Install deliberr package

# remotes::install_github("gumbelino/deliberr")
library(deliberr)
## Warning: replacing previous import 'ggplot2::alpha' by 'psych::alpha' when
## loading 'deliberr'
lsf.str("package:deliberr")
## get_dri : function (ic, adjusted = TRUE)  
## get_dri_alpha : function (data)  
## get_dri_ic : function (data)  
## get_dri_ind : function (ic)  
## permute_dri : function (data, iterations = 10000, verbose = FALSE, summary = TRUE)  
## plot_dri_ic : function (ic, title = NA, suffix = NA, dri = NA)  
## summarize_perm_dri : function (perms, type = "common")

Overview of data for analysis of LLM roles

Large Language Models (LLMs) Preview

LLMs
Provider Model
1 anthropic claude-3-5-sonnet-20241022
2 anthropic claude-3-7-sonnet-20250219
3 google gemini-2.5-flash
4 openai gpt-4o-mini
5 xai grok-3-beta

Building on our previous analysis, we selected only top models.

Cases

Deliberative Cases
case survey N topic subtopic
1 UBC Bio biobanking_mayo_ubc 17 genomics genomics
2 CCPS ACT Deliberative ccps 31 climate climate
3 CSIRO WA energy_futures 17 climate energy
4 FNQCJ fnqcj 11 climate transportation
5 Forest Lay Citizen forestera 9 climate forest
6 Fremantle fremantle 41 transportation transportation
7 Activate uppsala_speaks 26 immigration immigration
8 Standard uppsala_speaks 22 immigration immigration
9 Winterthur zh_winterthur 16 climate climate

Also building on our previous analysis, we selected only deliberative cases.

Surveys

Surveys
survey considerations policies scale_max q_method
1 biobanking_mayo_ubc 38 7 11 FALSE
2 ccps 33 7 11 FALSE
3 energy_futures 45 9 11 FALSE
4 fnqcj 42 5 12 FALSE
5 forestera 45 7 11 FALSE
6 fremantle 36 6 11 TRUE
7 uppsala_speaks 42 7 7 FALSE
8 zh_winterthur 30 6 7 FALSE

Note that two of the cases (Activate and Standard) share the same survey, uppsala_speaks.

Roles (System Prompts)

Number of Prompts by Type
type n
devils 1
ideology 10
perspective 10

System Prompts
uid type role description
1 csk devils climate skeptic prioritizes economic growth over CO2 emission cuts, fossil fuels over renewable energy, and does not believe in climate science
2 ana ideology anarchist rejects all coercive authority and hierarchical government, advocating stateless, voluntary societies
3 con ideology conservative seeks to preserve traditional institutions, customs, and values, favoring order and gradual change
4 eco ideology ecologist focuses on environmental protection and sustainability, advocating for societal change to ecological limits
5 fas ideology fascist promotes extreme nationalism, authoritarianism, militarism, and a totalitarian state
6 fem ideology feminist advocates for gender equality, challenging patriarchal structures and discrimination against women
7 fun ideology fundamentalist adheres strictly to core beliefs, often religious, applying these principles to all life aspects
8 lib ideology liberal advocates individual liberty, rights, limited government, and free markets, emphasizing individual autonomy
9 nat ideology nationalist prioritizes the interests and identity of a particular nation, often seeking self-determination
10 pop ideology populist appeals directly to “the people” against a perceived corrupt elite using anti-establishment rhetoric
11 soc ideology socialist aims for social ownership or control of production, emphasizing equality and collective welfare
12 coa perspective coastal resident endures chronic flooding and salinization, forced to relocate due to rising sea levels and intense storms worsened by climate change
13 ctr perspective construction worker suffers from extreme heat stress and lost work hours, perceiving climate change making outdoor labor unbearable and life-threatening
14 dis perspective disease survivor recovers from dengue fever, aware that climate change’s rising temperatures are expanding the range of disease-carrying mosquitoes in their region
15 eld perspective elderly urban resident endures intensified city heatwaves, struggling with disrupted services and feeling the direct, severe impact of climate change
16 far perspective displaced family loses their home due to unprecedented wildfires, experiencing displacement and recognizing climate change as the major driver of the devastation
17 fis perspective fisher notes his declining catches due to warming oceans, understanding that climate change is reorganizing marine life and reducing their traditional yield
18 lan perspective landowner surveys his parched fields after a prolonged drought, feeling the compounding impacts of climate change that reduce crop yields and family income
19 par perspective parent sees their child fall ill from a water-borne disease, attributing its spread to the increased heavy rainfall and warmer temperatures brought by climate change
20 sub perspective subsistence farmer watches his crops wither under erratic rainfall patterns, and who sees these changes as direct consequence of climate change
21 vil perspective villager faces dwindling, contaminated water supplies due to extended draughts and floods, aware that climate change is altering their water security

Summary of LLM Data Collection

We collected a total of 4200 LLM responses from 5 models across 8 surveys and 21 roles. We prompted each LLM 5 times with the same prompt.
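The total follows from a full factorial design. The sketch below rebuilds that design with placeholder labels (the `model_1`, `survey_1`, and `role_1` names are hypothetical, not the real identifiers) and counts the rows.

```r
# Full factorial prompt design: every model answers every survey in every role,
# and each prompt is repeated 5 times. Labels are placeholders, not real IDs.
design <- expand.grid(
  model      = paste0("model_", 1:5),
  survey     = paste0("survey_", 1:8),
  role       = paste0("role_", 1:21),
  repetition = 1:5,
  stringsAsFactors = FALSE
)

# 5 models x 8 surveys x 21 roles x 5 repetitions
n_responses <- nrow(design)
n_responses
```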

Climate Analysis

Subset of cases used in the climate analysis
case survey N topic subtopic
1 CCPS ACT Deliberative ccps 31 climate climate
2 CSIRO WA energy_futures 17 climate energy
3 Winterthur zh_winterthur 16 climate climate

Subset of roles used in the climate analysis
uid type article role description
1 eco ideology an ecologist focuses on environmental protection and sustainability, advocating for societal change to ecological limits
2 coa perspective a coastal resident endures chronic flooding and salinization, forced to relocate due to rising sea levels and intense storms worsened by climate change
3 ctr perspective a construction worker suffers from extreme heat stress and lost work hours, perceiving climate change making outdoor labor unbearable and life-threatening
4 dis perspective a disease survivor recovers from dengue fever, aware that climate change’s rising temperatures are expanding the range of disease-carrying mosquitoes in their region
5 eld perspective an elderly urban resident endures intensified city heatwaves, struggling with disrupted services and feeling the direct, severe impact of climate change
6 far perspective a displaced family loses their home due to unprecedented wildfires, experiencing displacement and recognizing climate change as the major driver of the devastation
7 fis perspective a fisher notes his declining catches due to warming oceans, understanding that climate change is reorganizing marine life and reducing their traditional yield
8 lan perspective a landowner surveys his parched fields after a prolonged drought, feeling the compounding impacts of climate change that reduce crop yields and family income
9 par perspective a parent sees their child fall ill from a water-borne disease, attributing its spread to the increased heavy rainfall and warmer temperatures brought by climate change
10 sub perspective a subsistence farmer watches his crops wither under erratic rainfall patterns, and who sees these changes as direct consequence of climate change
11 vil perspective a villager faces dwindling, contaminated water supplies due to extended draughts and floods, aware that climate change is altering their water security
12 csk devils a climate skeptic prioritizes economic growth over CO2 emission cuts, fossil fuels over renewable energy, and does not believe in climate science

For the climate analysis, we selected a subset of 900 responses generated by 5 models across 3 surveys and 12 roles described above. We prompted each LLM 5 times with the same prompt.

We calculated one DRI value per model/survey/role by treating each LLM response as one participant in a deliberation. The role “all” indicates that all roles were part of that deliberation (n = 60 participants, which equals 5 participants for each of the 12 roles). See example below.
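The grouping can be checked with a short base R sketch. It only rebuilds the design and counts participants per group. It does not compute DRI, and the `repetition` column name is an assumption made for illustration.

```r
models  <- c("claude-3-5-sonnet-20241022", "claude-3-7-sonnet-20250219",
             "gemini-2.5-flash", "gpt-4o-mini", "grok-3-beta")
surveys <- c("ccps", "energy_futures", "zh_winterthur")
roles   <- c("eco", "coa", "ctr", "dis", "eld", "far",
             "fis", "lan", "par", "sub", "vil", "csk")

# One row per LLM response in the climate subset.
responses <- expand.grid(model = models, survey = surveys, role = roles,
                         repetition = 1:5, stringsAsFactors = FALSE)

# One simulated deliberation per model/survey/role: 5 participants each.
per_role <- aggregate(repetition ~ model + survey + role,
                      data = responses, FUN = length)
names(per_role)[names(per_role) == "repetition"] <- "n"

# The pooled "all" deliberation combines every role within a model/survey pair.
pooled <- aggregate(repetition ~ model + survey, data = responses, FUN = length)
names(pooled)[names(pooled) == "repetition"] <- "n"
pooled$role <- "all"

# 5 models x 3 surveys x (12 roles + "all") groups, one DRI value each.
groups <- rbind(per_role, pooled[, names(per_role)])
nrow(groups)
```

Under these assumptions the result has 195 groups, which matches the number of rows in the detailed table of this report.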

Consistency results

Head (5) of DRI consistency across climate roles
model survey role dri alpha_c alpha_p alpha_all n
claude-3-5-sonnet-20241022 ccps all 0.417 0.991 0.623 0.988 60
claude-3-5-sonnet-20241022 ccps coa 0.437 0.792 0.590 0.836 5
claude-3-5-sonnet-20241022 ccps csk 0.158 0.768 0.778 0.746 5
claude-3-5-sonnet-20241022 ccps ctr 0.380 0.942 0.740 0.934 5
claude-3-5-sonnet-20241022 ccps dis 0.468 0.866 0.733 0.868 5

Consistency data (DRI and Cronbach’s alpha)
role variable n min max median iqr mean sd se ci
1 csk dri 15 -0.366 0.922 0.797 0.405 0.567 0.430 0.111 0.238
2 eco dri 15 -0.498 0.954 0.517 0.457 0.543 0.369 0.095 0.205
3 lan dri 15 -0.082 0.830 0.546 0.243 0.498 0.258 0.067 0.143
4 sub dri 15 -0.347 0.875 0.599 0.409 0.487 0.409 0.106 0.227
5 all dri 15 -0.112 0.759 0.591 0.255 0.483 0.273 0.070 0.151
6 coa dri 15 -0.326 0.906 0.452 0.398 0.455 0.346 0.089 0.191
7 ctr dri 15 -0.195 0.840 0.512 0.322 0.454 0.295 0.076 0.163
8 par dri 15 -0.644 0.901 0.535 0.565 0.438 0.448 0.116 0.248
9 fis dri 15 -0.331 0.899 0.563 0.266 0.426 0.379 0.098 0.210
10 dis dri 15 -0.243 0.942 0.477 0.205 0.419 0.265 0.068 0.147
11 far dri 15 -0.618 0.937 0.515 0.397 0.416 0.452 0.117 0.250
12 eld dri 15 -0.475 0.848 0.576 0.503 0.401 0.429 0.111 0.238
13 vil dri 15 -0.782 0.878 0.585 0.276 0.391 0.453 0.117 0.251
14 all alpha_c 15 0.967 0.991 0.988 0.007 0.985 0.007 0.002 0.004
15 lan alpha_c 15 0.833 0.961 0.917 0.054 0.908 0.037 0.010 0.021
16 eld alpha_c 15 0.841 0.962 0.909 0.045 0.903 0.034 0.009 0.019
17 fis alpha_c 15 0.851 0.941 0.899 0.044 0.901 0.027 0.007 0.015
18 sub alpha_c 15 0.843 0.956 0.904 0.048 0.898 0.033 0.008 0.018
19 par alpha_c 15 0.722 0.941 0.902 0.040 0.893 0.053 0.014 0.029
20 dis alpha_c 15 0.803 0.949 0.909 0.056 0.891 0.042 0.011 0.023
21 ctr alpha_c 15 0.782 0.951 0.896 0.035 0.887 0.048 0.012 0.027
22 vil alpha_c 15 0.817 0.934 0.891 0.058 0.883 0.039 0.010 0.022
23 coa alpha_c 15 0.750 0.945 0.881 0.080 0.872 0.060 0.016 0.033
24 eco alpha_c 15 0.793 0.950 0.863 0.030 0.867 0.038 0.010 0.021
25 far alpha_c 15 0.702 0.942 0.871 0.059 0.866 0.065 0.017 0.036
26 csk alpha_c 15 0.189 0.915 0.839 0.170 0.765 0.190 0.049 0.105
27 sub alpha_p 15 0.682 0.919 0.844 0.116 0.809 0.073 0.019 0.040
28 far alpha_p 15 0.632 0.912 0.799 0.105 0.802 0.081 0.021 0.045
29 csk alpha_p 15 0.590 0.917 0.783 0.153 0.800 0.095 0.024 0.052
30 ctr alpha_p 15 0.654 0.949 0.774 0.149 0.788 0.097 0.025 0.054
31 fis alpha_p 15 0.567 0.898 0.799 0.091 0.787 0.085 0.022 0.047
32 eco alpha_p 15 0.529 0.940 0.758 0.127 0.779 0.108 0.028 0.060
33 eld alpha_p 15 0.647 0.912 0.789 0.113 0.779 0.078 0.020 0.043
34 lan alpha_p 15 0.658 0.893 0.797 0.090 0.777 0.069 0.018 0.038
35 par alpha_p 15 0.598 0.882 0.778 0.097 0.775 0.075 0.019 0.042
36 dis alpha_p 15 0.681 0.894 0.763 0.060 0.771 0.061 0.016 0.034
37 coa alpha_p 15 0.590 0.889 0.778 0.076 0.760 0.080 0.021 0.045
38 vil alpha_p 15 0.367 0.892 0.798 0.064 0.759 0.139 0.036 0.077
39 all alpha_p 15 0.594 0.867 0.775 0.099 0.755 0.084 0.022 0.047
40 all alpha_all 15 0.960 0.989 0.987 0.007 0.983 0.008 0.002 0.004
41 fis alpha_all 15 0.882 0.942 0.925 0.013 0.919 0.017 0.004 0.010
42 lan alpha_all 15 0.844 0.964 0.923 0.040 0.916 0.033 0.009 0.018
43 sub alpha_all 15 0.843 0.959 0.921 0.041 0.913 0.031 0.008 0.017
44 ctr alpha_all 15 0.839 0.944 0.912 0.038 0.905 0.030 0.008 0.017
45 eld alpha_all 15 0.807 0.961 0.902 0.037 0.903 0.039 0.010 0.022
46 par alpha_all 15 0.812 0.942 0.913 0.033 0.902 0.038 0.010 0.021
47 vil alpha_all 15 0.802 0.932 0.912 0.032 0.901 0.036 0.009 0.020
48 dis alpha_all 15 0.858 0.923 0.901 0.033 0.896 0.021 0.005 0.011
49 coa alpha_all 15 0.805 0.941 0.899 0.057 0.886 0.040 0.010 0.022
50 far alpha_all 15 0.694 0.947 0.901 0.039 0.886 0.060 0.016 0.033
51 eco alpha_all 15 0.843 0.933 0.877 0.037 0.885 0.026 0.007 0.015
52 csk alpha_all 15 0.668 0.947 0.841 0.080 0.843 0.080 0.021 0.045

Note that each role has 15 data points: 3 surveys x 5 models.
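As a check on how the summary columns relate to each other, the sketch below recomputes the `dri` row for the pooled “all” role from the values listed for that role in the detailed table. The formulas for `se` (standard deviation over the square root of n) and `ci` (half-width of a 95% t-interval) are assumptions inferred from the reported numbers; the report does not state them.

```r
# DRI of the pooled "all" role for each model/survey pair,
# copied from the detailed table in this report.
dri_all <- c(0.417, 0.497, 0.624,    # claude-3-5-sonnet-20241022
             0.676, 0.591, 0.649,    # claude-3-7-sonnet-20250219
             0.711, 0.527, 0.677,    # gemini-2.5-flash
             0.123, -0.010, -0.112,  # gpt-4o-mini
             0.427, 0.691, 0.759)    # grok-3-beta

n  <- length(dri_all)
se <- sd(dri_all) / sqrt(n)        # assumed: standard error of the mean
ci <- qt(0.975, df = n - 1) * se   # assumed: half-width of a 95% t-interval

stats <- c(n = n, min = min(dri_all), max = max(dri_all),
           median = median(dri_all), iqr = IQR(dri_all),
           mean = mean(dri_all), sd = sd(dri_all), se = se, ci = ci)
round(stats, 3)
```

`IQR()` uses R's default quantile definition here. Small differences in the last digit are possible because the inputs are already rounded to three decimals.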

We found that LLMs are consistent across roles in terms of both DRI and Cronbach’s alpha (policies). The high DRI across roles (role “all”: median = 0.591; IQR = 0.255) suggests that LLMs tend to consistently align their considerations and policy preferences. The high Cronbach’s alpha for their policy preferences (role “all”: median = 0.775; IQR = 0.099) suggests that LLMs tend to agree on the ranking of their policy preferences.
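For readers unfamiliar with the statistic, here is a minimal base R sketch of the textbook Cronbach's alpha formula. The `cronbach_alpha` helper and the toy ratings are hypothetical. The report does not say which dimension deliberr treats as items, so this is not necessarily how `get_dri_alpha` computes its values.

```r
# Cronbach's alpha for a respondents-by-items matrix (rows = respondents).
cronbach_alpha <- function(x) {
  x <- as.matrix(x)
  k <- ncol(x)                    # number of items
  item_var  <- apply(x, 2, var)   # variance of each item
  total_var <- var(rowSums(x))    # variance of the summed scale
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

# Toy ratings: 6 respondents, 4 items (made-up numbers for illustration only).
ratings <- cbind(
  item1 = c(1, 2, 3, 4, 5, 6),
  item2 = c(2, 2, 3, 5, 5, 6),
  item3 = c(1, 3, 3, 4, 6, 6),
  item4 = c(2, 1, 4, 4, 5, 7)
)
alpha_toy <- cronbach_alpha(ratings)
alpha_toy
```

Alpha approaches 1 when the items rank the respondents in nearly the same way, which is the sense in which a high value indicates agreement.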

Summary for model

Mean DRI across models and roles
role claude-3-5-sonnet-20241022 claude-3-7-sonnet-20250219 gemini-2.5-flash gpt-4o-mini grok-3-beta best
1 all 0.512 0.639 0.638 0.000 0.625 claude-3-7-sonnet-20250219
2 coa 0.350 0.565 0.810 -0.019 0.567 gemini-2.5-flash
3 csk 0.543 0.773 0.875 -0.153 0.795 gemini-2.5-flash
4 ctr 0.343 0.567 0.663 0.252 0.447 gemini-2.5-flash
5 dis 0.476 0.538 0.569 0.057 0.455 gemini-2.5-flash
6 eco 0.364 0.720 0.854 0.084 0.696 gemini-2.5-flash
7 eld 0.404 0.498 0.796 -0.322 0.626 gemini-2.5-flash
8 far 0.479 0.651 0.821 -0.370 0.497 gemini-2.5-flash
9 fis 0.497 0.593 0.685 -0.244 0.602 gemini-2.5-flash
10 lan 0.595 0.633 0.477 0.199 0.587 claude-3-7-sonnet-20250219
11 par 0.498 0.708 0.598 -0.284 0.670 claude-3-7-sonnet-20250219
12 sub 0.526 0.712 0.556 -0.014 0.654 claude-3-7-sonnet-20250219
13 vil 0.581 0.604 0.407 -0.252 0.613 grok-3-beta
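One row of this table can be rebuilt from the detailed data as a sanity check. The sketch below averages the `vil` DRI values over the three surveys for each model and picks the model with the highest mean. It assumes that `best` means the highest mean DRI, which is consistent with every row of the table.

```r
# DRI for role "vil", copied from the detailed table
# (survey order: ccps, energy_futures, zh_winterthur).
model <- rep(c("claude-3-5-sonnet-20241022", "claude-3-7-sonnet-20250219",
               "gemini-2.5-flash", "gpt-4o-mini", "grok-3-beta"), each = 3)
dri_vil <- c(0.600, 0.558, 0.584,
             0.585, 0.622, 0.606,
             0.753, -0.119, 0.588,
             -0.327, 0.353, -0.782,
             0.323, 0.640, 0.878)

# Mean DRI per model, then the model with the highest mean.
mean_dri <- tapply(dri_vil, model, mean)
best <- names(which.max(mean_dri))

round(mean_dri, 3)
best
```

Recomputed means can differ from the table in the third decimal because the inputs are already rounded.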

Summary Cronbach’s Alpha (Policies)

Mean alpha (policies) across models and roles
role claude-3-5-sonnet-20241022 claude-3-7-sonnet-20250219 gemini-2.5-flash gpt-4o-mini grok-3-beta best
1 all 0.725 0.792 0.801 0.641 0.818 grok-3-beta
2 coa 0.713 0.745 0.771 0.763 0.807 grok-3-beta
3 csk 0.783 0.802 0.848 0.715 0.851 grok-3-beta
4 ctr 0.749 0.791 0.918 0.727 0.755 gemini-2.5-flash
5 dis 0.761 0.772 0.771 0.756 0.796 grok-3-beta
6 eco 0.764 0.844 0.814 0.759 0.716 claude-3-7-sonnet-20250219
7 eld 0.722 0.793 0.741 0.813 0.828 grok-3-beta
8 far 0.726 0.807 0.827 0.828 0.824 gpt-4o-mini
9 fis 0.787 0.792 0.829 0.825 0.704 gemini-2.5-flash
10 lan 0.715 0.792 0.789 0.795 0.792 gpt-4o-mini
11 par 0.785 0.704 0.790 0.762 0.833 grok-3-beta
12 sub 0.841 0.800 0.761 0.803 0.839 claude-3-5-sonnet-20241022
13 vil 0.708 0.818 0.808 0.798 0.662 claude-3-7-sonnet-20250219

Summary Cronbach’s Alpha (Considerations)

Mean alpha (considerations) across models and roles
role claude-3-5-sonnet-20241022 claude-3-7-sonnet-20250219 gemini-2.5-flash gpt-4o-mini grok-3-beta best
1 all 0.990 0.990 0.984 0.976 0.987 claude-3-5-sonnet-20241022
2 coa 0.863 0.918 0.849 0.837 0.891 claude-3-7-sonnet-20250219
3 csk 0.769 0.856 0.551 0.817 0.831 claude-3-7-sonnet-20250219
4 ctr 0.916 0.909 0.852 0.852 0.906 claude-3-5-sonnet-20241022
5 dis 0.905 0.921 0.859 0.876 0.896 claude-3-7-sonnet-20250219
6 eco 0.900 0.860 0.842 0.871 0.863 claude-3-5-sonnet-20241022
7 eld 0.917 0.899 0.917 0.879 0.903 claude-3-5-sonnet-20241022
8 far 0.905 0.848 0.815 0.860 0.905 claude-3-5-sonnet-20241022
9 fis 0.916 0.895 0.896 0.891 0.905 claude-3-5-sonnet-20241022
10 lan 0.917 0.914 0.884 0.909 0.917 claude-3-5-sonnet-20241022
11 par 0.925 0.905 0.830 0.885 0.922 claude-3-5-sonnet-20241022
12 sub 0.902 0.919 0.851 0.906 0.911 claude-3-7-sonnet-20250219
13 vil 0.881 0.880 0.873 0.895 0.887 gpt-4o-mini

Detailed data

DRI consistency across 12 climate roles
model survey role dri alpha_c alpha_p alpha_all n
1 claude-3-5-sonnet-20241022 ccps all 0.417 0.991 0.623 0.988 60
2 claude-3-5-sonnet-20241022 ccps coa 0.437 0.792 0.590 0.836 5
3 claude-3-5-sonnet-20241022 ccps csk 0.158 0.768 0.778 0.746 5
4 claude-3-5-sonnet-20241022 ccps ctr 0.380 0.942 0.740 0.934 5
5 claude-3-5-sonnet-20241022 ccps dis 0.468 0.866 0.733 0.868 5
6 claude-3-5-sonnet-20241022 ccps eco 0.340 0.863 0.757 0.898 5
7 claude-3-5-sonnet-20241022 ccps eld 0.322 0.909 0.673 0.901 5
8 claude-3-5-sonnet-20241022 ccps far 0.434 0.901 0.632 0.916 5
9 claude-3-5-sonnet-20241022 ccps fis 0.424 0.941 0.776 0.928 5
10 claude-3-5-sonnet-20241022 ccps lan 0.457 0.933 0.689 0.923 5
11 claude-3-5-sonnet-20241022 ccps par 0.520 0.915 0.728 0.896 5
12 claude-3-5-sonnet-20241022 ccps sub -0.029 0.870 0.798 0.883 5
13 claude-3-5-sonnet-20241022 ccps vil 0.600 0.866 0.791 0.802 5
14 claude-3-5-sonnet-20241022 energy_futures all 0.497 0.989 0.772 0.988 60
15 claude-3-5-sonnet-20241022 energy_futures coa 0.167 0.881 0.771 0.903 5
16 claude-3-5-sonnet-20241022 energy_futures csk 0.869 0.915 0.726 0.919 5
17 claude-3-5-sonnet-20241022 energy_futures ctr -0.023 0.896 0.685 0.885 5
18 claude-3-5-sonnet-20241022 energy_futures dis 0.477 0.922 0.763 0.917 5
19 claude-3-5-sonnet-20241022 energy_futures eco 0.326 0.950 0.679 0.933 5
20 claude-3-5-sonnet-20241022 energy_futures eld 0.246 0.909 0.693 0.929 5
21 claude-3-5-sonnet-20241022 energy_futures far 0.553 0.942 0.767 0.947 5
22 claude-3-5-sonnet-20241022 energy_futures fis 0.436 0.915 0.786 0.935 5
23 claude-3-5-sonnet-20241022 energy_futures lan 0.645 0.951 0.658 0.952 5
24 claude-3-5-sonnet-20241022 energy_futures par 0.535 0.919 0.776 0.939 5
25 claude-3-5-sonnet-20241022 energy_futures sub 0.846 0.922 0.882 0.921 5
26 claude-3-5-sonnet-20241022 energy_futures vil 0.558 0.928 0.517 0.931 5
27 claude-3-5-sonnet-20241022 zh_winterthur all 0.624 0.989 0.780 0.988 60
28 claude-3-5-sonnet-20241022 zh_winterthur coa 0.447 0.916 0.778 0.845 5
29 claude-3-5-sonnet-20241022 zh_winterthur csk 0.601 0.623 0.845 0.820 5
30 claude-3-5-sonnet-20241022 zh_winterthur ctr 0.672 0.912 0.822 0.901 5
31 claude-3-5-sonnet-20241022 zh_winterthur dis 0.484 0.927 0.786 0.914 5
32 claude-3-5-sonnet-20241022 zh_winterthur eco 0.425 0.887 0.855 0.859 5
33 claude-3-5-sonnet-20241022 zh_winterthur eld 0.645 0.933 0.799 0.892 5
34 claude-3-5-sonnet-20241022 zh_winterthur far 0.449 0.870 0.778 0.839 5
35 claude-3-5-sonnet-20241022 zh_winterthur fis 0.631 0.893 0.799 0.900 5
36 claude-3-5-sonnet-20241022 zh_winterthur lan 0.683 0.868 0.799 0.867 5
37 claude-3-5-sonnet-20241022 zh_winterthur par 0.440 0.941 0.850 0.910 5
38 claude-3-5-sonnet-20241022 zh_winterthur sub 0.761 0.913 0.844 0.901 5
39 claude-3-5-sonnet-20241022 zh_winterthur vil 0.584 0.847 0.816 0.865 5
40 claude-3-7-sonnet-20250219 ccps all 0.676 0.990 0.775 0.989 60
41 claude-3-7-sonnet-20250219 ccps coa 0.683 0.874 0.717 0.908 5
42 claude-3-7-sonnet-20250219 ccps csk 0.719 0.813 0.855 0.863 5
43 claude-3-7-sonnet-20250219 ccps ctr 0.769 0.951 0.682 0.944 5
44 claude-3-7-sonnet-20250219 ccps dis 0.544 0.927 0.732 0.916 5
45 claude-3-7-sonnet-20250219 ccps eco 0.867 0.862 0.887 0.873 5
46 claude-3-7-sonnet-20250219 ccps eld 0.576 0.890 0.732 0.899 5
47 claude-3-7-sonnet-20250219 ccps far 0.785 0.759 0.682 0.853 5
48 claude-3-7-sonnet-20250219 ccps fis 0.582 0.899 0.819 0.888 5
49 claude-3-7-sonnet-20250219 ccps lan 0.523 0.917 0.764 0.902 5
50 claude-3-7-sonnet-20250219 ccps par 0.770 0.902 0.682 0.921 5
51 claude-3-7-sonnet-20250219 ccps sub 0.780 0.920 0.682 0.923 5
52 claude-3-7-sonnet-20250219 ccps vil 0.585 0.817 0.798 0.872 5
53 claude-3-7-sonnet-20250219 energy_futures all 0.591 0.988 0.814 0.988 60
54 claude-3-7-sonnet-20250219 energy_futures coa 0.560 0.935 0.741 0.941 5
55 claude-3-7-sonnet-20250219 energy_futures csk 0.801 0.915 0.833 0.947 5
56 claude-3-7-sonnet-20250219 energy_futures ctr 0.420 0.902 0.842 0.929 5
57 claude-3-7-sonnet-20250219 energy_futures dis 0.568 0.911 0.689 0.901 5
58 claude-3-7-sonnet-20250219 energy_futures eco 0.774 0.859 0.706 0.901 5
59 claude-3-7-sonnet-20250219 energy_futures eld 0.288 0.930 0.789 0.942 5
60 claude-3-7-sonnet-20250219 energy_futures far 0.663 0.917 0.889 0.928 5
61 claude-3-7-sonnet-20250219 energy_futures fis 0.563 0.935 0.758 0.942 5
62 claude-3-7-sonnet-20250219 energy_futures lan 0.546 0.863 0.797 0.893 5
63 claude-3-7-sonnet-20250219 energy_futures par 0.813 0.924 0.598 0.921 5
64 claude-3-7-sonnet-20250219 energy_futures sub 0.791 0.936 0.849 0.949 5
65 claude-3-7-sonnet-20250219 energy_futures vil 0.622 0.910 0.798 0.924 5
66 claude-3-7-sonnet-20250219 zh_winterthur all 0.649 0.991 0.787 0.989 60
67 claude-3-7-sonnet-20250219 zh_winterthur coa 0.452 0.945 0.778 0.916 5
68 claude-3-7-sonnet-20250219 zh_winterthur csk 0.797 0.839 0.718 0.841 5
69 claude-3-7-sonnet-20250219 zh_winterthur ctr 0.512 0.874 0.848 0.894 5
70 claude-3-7-sonnet-20250219 zh_winterthur dis 0.504 0.924 0.894 0.880 5
71 claude-3-7-sonnet-20250219 zh_winterthur eco 0.517 0.860 0.939 0.877 5
72 claude-3-7-sonnet-20250219 zh_winterthur eld 0.630 0.875 0.857 0.854 5
73 claude-3-7-sonnet-20250219 zh_winterthur far 0.506 0.866 0.848 0.884 5
74 claude-3-7-sonnet-20250219 zh_winterthur fis 0.633 0.851 0.800 0.910 5
75 claude-3-7-sonnet-20250219 zh_winterthur lan 0.830 0.961 0.816 0.964 5
76 claude-3-7-sonnet-20250219 zh_winterthur par 0.543 0.888 0.833 0.812 5
77 claude-3-7-sonnet-20250219 zh_winterthur sub 0.564 0.902 0.870 0.929 5
78 claude-3-7-sonnet-20250219 zh_winterthur vil 0.606 0.912 0.857 0.914 5
79 gemini-2.5-flash ccps all 0.711 0.982 0.765 0.982 60
80 gemini-2.5-flash ccps coa 0.854 0.750 0.889 0.805 5
81 gemini-2.5-flash ccps csk 0.895 0.606 0.722 0.718 5
82 gemini-2.5-flash ccps ctr 0.784 0.782 0.948 0.839 5
83 gemini-2.5-flash ccps dis 0.942 0.880 0.831 0.889 5
84 gemini-2.5-flash ccps eco 0.826 0.841 0.940 0.872 5
85 gemini-2.5-flash ccps eld 0.848 0.848 0.647 0.876 5
86 gemini-2.5-flash ccps far 0.937 0.702 0.750 0.694 5
87 gemini-2.5-flash ccps fis 0.899 0.874 0.750 0.882 5
88 gemini-2.5-flash ccps lan 0.688 0.833 0.781 0.844 5
89 gemini-2.5-flash ccps par 0.703 0.869 0.706 0.883 5
90 gemini-2.5-flash ccps sub 0.875 0.861 0.844 0.891 5
91 gemini-2.5-flash ccps vil 0.753 0.882 0.725 0.912 5
92 gemini-2.5-flash energy_futures all 0.527 0.981 0.825 0.982 60
93 gemini-2.5-flash energy_futures coa 0.906 0.928 0.625 0.941 5
94 gemini-2.5-flash energy_futures csk 0.809 0.859 0.907 0.902 5
95 gemini-2.5-flash energy_futures ctr 0.620 0.893 0.857 0.915 5
96 gemini-2.5-flash energy_futures dis 0.313 0.895 0.681 0.907 5
97 gemini-2.5-flash energy_futures eco 0.853 0.866 0.753 0.893 5
98 gemini-2.5-flash energy_futures eld 0.771 0.939 0.871 0.954 5
99 gemini-2.5-flash energy_futures far 0.827 0.870 0.821 0.901 5
100 gemini-2.5-flash energy_futures fis 0.486 0.930 0.884 0.923 5
101 gemini-2.5-flash energy_futures lan 0.138 0.934 0.759 0.944 5
102 gemini-2.5-flash energy_futures par 0.220 0.900 0.823 0.910 5
103 gemini-2.5-flash energy_futures sub 0.375 0.850 0.733 0.893 5
104 gemini-2.5-flash energy_futures vil -0.119 0.910 0.892 0.932 5
105 gemini-2.5-flash zh_winterthur all 0.677 0.989 0.814 0.988 60
106 gemini-2.5-flash zh_winterthur coa 0.671 0.868 0.800 0.868 5
107 gemini-2.5-flash zh_winterthur csk 0.922 0.189 0.917 0.830 5
108 gemini-2.5-flash zh_winterthur ctr 0.586 0.880 0.949 0.912 5
109 gemini-2.5-flash zh_winterthur dis 0.451 0.803 0.800 0.885 5
110 gemini-2.5-flash zh_winterthur eco 0.882 0.819 0.750 0.843 5
111 gemini-2.5-flash zh_winterthur eld 0.769 0.962 0.704 0.961 5
112 gemini-2.5-flash zh_winterthur far 0.700 0.871 0.909 0.906 5
113 gemini-2.5-flash zh_winterthur fis 0.671 0.885 0.853 0.925 5
114 gemini-2.5-flash zh_winterthur lan 0.605 0.886 0.825 0.931 5
115 gemini-2.5-flash zh_winterthur par 0.872 0.722 0.840 0.851 5
116 gemini-2.5-flash zh_winterthur sub 0.419 0.843 0.705 0.843 5
117 gemini-2.5-flash zh_winterthur vil 0.588 0.825 0.806 0.863 5
118 gpt-4o-mini ccps all 0.123 0.977 0.699 0.974 60
119 gpt-4o-mini ccps coa 0.313 0.784 0.804 0.851 5
120 gpt-4o-mini ccps csk -0.366 0.911 0.590 0.906 5
121 gpt-4o-mini ccps ctr 0.582 0.791 0.774 0.886 5
122 gpt-4o-mini ccps dis 0.334 0.838 0.740 0.876 5
123 gpt-4o-mini ccps eco 0.490 0.845 0.821 0.905 5
124 gpt-4o-mini ccps eld -0.185 0.873 0.912 0.910 5
125 gpt-4o-mini ccps far -0.197 0.838 0.912 0.901 5
126 gpt-4o-mini ccps fis -0.331 0.878 0.869 0.920 5
127 gpt-4o-mini ccps lan 0.536 0.873 0.689 0.892 5
128 gpt-4o-mini ccps par 0.012 0.875 0.748 0.913 5
129 gpt-4o-mini ccps sub -0.347 0.892 0.919 0.936 5
130 gpt-4o-mini ccps vil -0.327 0.860 0.862 0.920 5
131 gpt-4o-mini energy_futures all -0.010 0.967 0.594 0.960 60
132 gpt-4o-mini energy_futures coa -0.326 0.906 0.808 0.899 5
133 gpt-4o-mini energy_futures csk -0.294 0.859 0.783 0.827 5
134 gpt-4o-mini energy_futures ctr 0.369 0.913 0.654 0.926 5
135 gpt-4o-mini energy_futures dis -0.243 0.920 0.757 0.902 5
136 gpt-4o-mini energy_futures eco -0.498 0.904 0.699 0.915 5
137 gpt-4o-mini energy_futures eld -0.307 0.922 0.786 0.921 5
138 gpt-4o-mini energy_futures far -0.294 0.916 0.799 0.914 5
139 gpt-4o-mini energy_futures fis -0.140 0.929 0.707 0.931 5
140 gpt-4o-mini energy_futures lan -0.082 0.948 0.802 0.957 5
141 gpt-4o-mini energy_futures par -0.219 0.918 0.778 0.914 5
142 gpt-4o-mini energy_futures sub 0.599 0.956 0.753 0.959 5
143 gpt-4o-mini energy_futures vil 0.353 0.934 0.789 0.930 5
144 gpt-4o-mini zh_winterthur all -0.112 0.983 0.629 0.977 60
145 gpt-4o-mini zh_winterthur coa -0.044 0.820 0.677 0.857 5
146 gpt-4o-mini zh_winterthur csk 0.201 0.683 0.773 0.668 5
147 gpt-4o-mini zh_winterthur ctr -0.195 0.852 0.752 0.857 5
148 gpt-4o-mini zh_winterthur dis 0.080 0.869 0.772 0.858 5
149 gpt-4o-mini zh_winterthur eco 0.260 0.865 0.758 0.876 5
150 gpt-4o-mini zh_winterthur eld -0.475 0.841 0.740 0.807 5
151 gpt-4o-mini zh_winterthur far -0.618 0.826 0.774 0.871 5
152 gpt-4o-mini zh_winterthur fis -0.262 0.867 0.898 0.927 5
153 gpt-4o-mini zh_winterthur lan 0.143 0.906 0.893 0.923 5
154 gpt-4o-mini zh_winterthur par -0.644 0.864 0.758 0.848 5
155 gpt-4o-mini zh_winterthur sub -0.293 0.871 0.738 0.877 5
156 gpt-4o-mini zh_winterthur vil -0.782 0.891 0.744 0.907 5
157 grok-3-beta ccps all 0.427 0.990 0.731 0.987 60
158 grok-3-beta ccps coa 0.245 0.862 0.856 0.910 5
159 grok-3-beta ccps csk 0.786 0.900 0.917 0.921 5
160 grok-3-beta ccps ctr 0.223 0.924 0.870 0.940 5
161 grok-3-beta ccps dis 0.237 0.909 0.882 0.923 5
162 grok-3-beta ccps eco 0.666 0.877 0.855 0.859 5
163 grok-3-beta ccps eld 0.304 0.885 0.828 0.883 5
164 grok-3-beta ccps far 0.224 0.872 0.882 0.919 5
165 grok-3-beta ccps fis 0.325 0.885 0.837 0.926 5
166 grok-3-beta ccps lan 0.384 0.923 0.815 0.916 5
167 grok-3-beta ccps par 0.232 0.899 0.882 0.923 5
168 grok-3-beta ccps sub 0.378 0.918 0.853 0.937 5
169 grok-3-beta ccps vil 0.323 0.921 0.837 0.912 5
170 grok-3-beta energy_futures all 0.691 0.985 0.867 0.986 60
171 grok-3-beta energy_futures coa 0.878 0.885 0.760 0.912 5
172 grok-3-beta energy_futures csk 0.797 0.856 0.908 0.900 5
173 grok-3-beta energy_futures ctr 0.278 0.908 0.673 0.922 5
174 grok-3-beta energy_futures dis 0.514 0.949 0.773 0.923 5
175 grok-3-beta energy_futures eco 0.954 0.919 0.529 0.919 5
176 grok-3-beta energy_futures eld 0.805 0.908 0.821 0.902 5
177 grok-3-beta energy_futures far 0.515 0.935 0.754 0.917 5
178 grok-3-beta energy_futures fis 0.834 0.921 0.708 0.928 5
179 grok-3-beta energy_futures lan 0.552 0.927 0.692 0.926 5
180 grok-3-beta energy_futures par 0.901 0.937 0.836 0.942 5
181 grok-3-beta energy_futures sub 0.857 0.910 0.782 0.930 5
182 grok-3-beta energy_futures vil 0.640 0.907 0.367 0.918 5
183 grok-3-beta zh_winterthur all 0.759 0.988 0.855 0.987 60
184 grok-3-beta zh_winterthur coa 0.580 0.926 0.806 0.896 5
185 grok-3-beta zh_winterthur csk 0.801 0.738 0.729 0.835 5
186 grok-3-beta zh_winterthur ctr 0.840 0.885 0.721 0.894 5
187 grok-3-beta zh_winterthur dis 0.614 0.831 0.733 0.883 5
188 grok-3-beta zh_winterthur eco 0.467 0.793 0.763 0.857 5
189 grok-3-beta zh_winterthur eld 0.771 0.914 0.835 0.917 5
190 grok-3-beta zh_winterthur far 0.752 0.907 0.835 0.899 5
191 grok-3-beta zh_winterthur fis 0.647 0.908 0.567 0.925 5
192 grok-3-beta zh_winterthur lan 0.825 0.901 0.868 0.907 5
193 grok-3-beta zh_winterthur par 0.877 0.929 0.781 0.941 5
194 grok-3-beta zh_winterthur sub 0.726 0.904 0.881 0.920 5
195 grok-3-beta zh_winterthur vil 0.878 0.833 0.781 0.910 5

Model/Survey DRI Plots

Survey/Role DRI Plots

Permutation tests

Surveys and Roles: Are models truly consistent across roles?

In this first permutation test, we explore the likelihood that the consistency, measured by DRI, is due to chance.
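The general logic of such a test can be sketched in base R. This is a generic one-sided permutation test on a toy statistic (a Pearson correlation between made-up paired scores), not a reimplementation of `permute_dri`. The report does not describe how that function builds its null distribution.

```r
set.seed(42)  # reproducible shuffles

# Toy paired scores (made-up); in the report the statistic is the DRI instead.
x <- c(2.1, 3.4, 1.9, 5.6, 4.8, 6.2, 3.3, 7.1, 5.0, 6.6)
y <- c(1.8, 3.9, 2.5, 5.1, 5.3, 6.0, 2.9, 6.8, 4.4, 7.0)

statistic <- function(a, b) cor(a, b)

# Observed value on the intact pairing.
obs <- statistic(x, y)

# Null distribution: break the pairing by shuffling y, then recompute.
iterations <- 2000
perm <- replicate(iterations, statistic(x, sample(y)))

# One-sided p-value: share of shuffles at least as large as the observed value.
p_value <- mean(perm >= obs)
p_value
```

A p-value computed this way can be exactly 0 when no shuffle reaches the observed value, which is how a reported p of 0.000 should be read: smaller than the resolution allowed by the number of iterations.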

## Warning: Using `bins = 30` by default. Pick better value with the argument
## `bins`.

Number of surveys (out of 3) in which each role is significant (p < 0.05)
role sig
dis 1
eld 1
far 1
lan 1
sub 1
coa 2
csk 2
ctr 2
fis 2
vil 2
eco 3
par 3

Number of roles (out of 12) that are significant (p < 0.05) in each survey
survey sig
ccps 3
zh_winterthur 6
energy_futures 12

Survey/Role Permutation Summary
obs_dri p n min max median iqr mean sd se ci survey role
0.169 0.000 10000 0.116 0.166 0.140 0.010 0.140 0.007 0 0.000 energy_futures coa
0.246 0.000 10000 0.171 0.245 0.209 0.016 0.209 0.011 0 0.000 energy_futures ctr
0.159 0.000 10000 0.055 0.154 0.102 0.023 0.103 0.017 0 0.000 energy_futures eld
0.271 0.000 10000 0.182 0.269 0.222 0.020 0.222 0.014 0 0.000 energy_futures fis
0.341 0.000 10000 0.251 0.337 0.290 0.019 0.291 0.013 0 0.000 energy_futures par
0.512 0.000 10000 0.336 0.506 0.403 0.039 0.405 0.027 0 0.001 energy_futures sub
0.558 0.000 10000 0.305 0.531 0.392 0.050 0.396 0.036 0 0.001 energy_futures csk
0.488 0.000 10000 0.472 0.486 0.478 0.003 0.478 0.002 0 0.000 zh_winterthur eco
0.219 0.000 10000 0.192 0.217 0.205 0.005 0.205 0.004 0 0.000 zh_winterthur par
0.164 0.000 10000 0.122 0.165 0.144 0.009 0.144 0.007 0 0.000 energy_futures dis
0.255 0.000 10000 0.207 0.256 0.233 0.011 0.233 0.008 0 0.000 energy_futures far
0.248 0.000 10000 0.154 0.250 0.206 0.018 0.206 0.013 0 0.000 energy_futures lan
0.254 0.000 10000 0.136 0.258 0.196 0.025 0.196 0.018 0 0.000 energy_futures vil
0.357 0.000 10000 0.285 0.359 0.319 0.019 0.319 0.013 0 0.000 energy_futures eco
0.631 0.001 10000 0.610 0.632 0.619 0.005 0.620 0.004 0 0.000 ccps eco
0.465 0.002 10000 0.361 0.472 0.409 0.029 0.410 0.020 0 0.000 ccps csk
0.279 0.003 10000 0.227 0.284 0.253 0.014 0.254 0.010 0 0.000 zh_winterthur vil
0.429 0.003 10000 0.412 0.432 0.421 0.004 0.421 0.003 0 0.000 ccps par
0.481 0.008 10000 0.435 0.489 0.460 0.012 0.460 0.008 0 0.000 zh_winterthur ctr
0.352 0.010 10000 0.308 0.359 0.332 0.012 0.332 0.008 0 0.000 zh_winterthur fis
0.422 0.020 10000 0.404 0.428 0.416 0.005 0.416 0.003 0 0.000 zh_winterthur coa
0.410 0.056 10000 0.364 0.427 0.392 0.015 0.393 0.010 0 0.000 zh_winterthur sub
0.314 0.099 10000 0.305 0.319 0.311 0.003 0.311 0.002 0 0.000 ccps fis
0.481 0.110 10000 0.468 0.486 0.477 0.004 0.477 0.003 0 0.000 ccps dis
0.631 0.112 10000 0.580 0.662 0.614 0.017 0.615 0.012 0 0.000 zh_winterthur lan
0.316 0.132 10000 0.292 0.328 0.310 0.007 0.310 0.005 0 0.000 ccps sub
0.309 0.150 10000 0.293 0.316 0.305 0.006 0.305 0.004 0 0.000 zh_winterthur far
0.528 0.152 10000 0.511 0.539 0.524 0.006 0.524 0.004 0 0.000 ccps lan
0.434 0.194 10000 0.416 0.445 0.430 0.006 0.430 0.004 0 0.000 zh_winterthur dis
0.647 0.306 10000 0.611 0.691 0.641 0.017 0.642 0.012 0 0.000 zh_winterthur csk
0.402 0.324 10000 0.385 0.418 0.400 0.006 0.400 0.005 0 0.000 ccps eld
0.445 0.349 10000 0.416 0.475 0.439 0.018 0.441 0.011 0 0.000 zh_winterthur eld
0.447 0.388 10000 0.428 0.469 0.445 0.009 0.446 0.006 0 0.000 ccps vil
0.549 0.551 10000 0.538 0.564 0.550 0.005 0.550 0.004 0 0.000 ccps coa
0.569 0.727 10000 0.561 0.585 0.571 0.005 0.572 0.004 0 0.000 ccps ctr
0.439 0.769 10000 0.429 0.457 0.443 0.007 0.443 0.005 0 0.000 ccps far
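The significance counts reported in this section can be recomputed from the `p` column of the permutation summary. The sketch below copies those p-values into a small data frame (the `perm_summary` and `sig` names are illustrative) and tabulates the results with p < 0.05.

```r
roles <- c("coa", "csk", "ctr", "dis", "eco", "eld",
           "far", "fis", "lan", "par", "sub", "vil")

# p-values copied from the Survey/Role permutation summary, in `roles` order.
p_energy <- rep(0, 12)
p_zh     <- c(0.020, 0.306, 0.008, 0.194, 0.000, 0.349,
              0.150, 0.010, 0.112, 0.000, 0.056, 0.003)
p_ccps   <- c(0.551, 0.002, 0.727, 0.110, 0.001, 0.324,
              0.769, 0.099, 0.152, 0.003, 0.132, 0.388)

perm_summary <- data.frame(
  survey = rep(c("energy_futures", "zh_winterthur", "ccps"), each = 12),
  role   = rep(roles, times = 3),
  p      = c(p_energy, p_zh, p_ccps)
)
perm_summary$sig <- perm_summary$p < 0.05

# Significant roles per survey, and significant surveys per role.
by_survey <- tapply(perm_summary$sig, perm_summary$survey, sum)
by_role   <- tapply(perm_summary$sig, perm_summary$role, sum)

by_survey
sort(by_role)
```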

Models and Surveys: Which models are consistent across roles?

## Warning: Using `bins = 30` by default. Pick better value with the argument
## `bins`.

Survey/Model Permutation Summary
obs_dri p n min max median iqr mean sd se ci survey model
0.417 0 10000 -0.293 0.317 -0.236 0.112 -0.202 0.074 0.001 0.001 ccps claude-3-5-sonnet-20241022
0.676 0 10000 -0.164 0.551 -0.117 0.137 -0.072 0.085 0.001 0.002 ccps claude-3-7-sonnet-20250219
0.427 0 10000 -0.302 0.255 -0.257 0.115 -0.219 0.072 0.001 0.001 ccps grok-3-beta
0.711 0 10000 -0.109 0.403 -0.061 0.130 -0.018 0.081 0.001 0.002 ccps gemini-2.5-flash
0.123 0 10000 -0.376 0.078 -0.269 0.099 -0.261 0.070 0.001 0.001 ccps gpt-4o-mini
0.497 0 10000 -0.302 0.197 -0.228 0.118 -0.195 0.077 0.001 0.002 energy_futures claude-3-5-sonnet-20241022
0.591 0 10000 -0.214 0.294 -0.154 0.123 -0.118 0.081 0.001 0.002 energy_futures claude-3-7-sonnet-20250219
0.691 0 10000 -0.143 0.366 -0.088 0.129 -0.052 0.082 0.001 0.002 energy_futures grok-3-beta
0.527 0 10000 -0.246 0.251 -0.150 0.112 -0.132 0.077 0.001 0.002 energy_futures gemini-2.5-flash
-0.010 0 10000 -0.266 -0.022 -0.190 0.047 -0.187 0.034 0.000 0.001 energy_futures gpt-4o-mini
0.624 0 10000 -0.162 0.455 -0.118 0.124 -0.074 0.080 0.001 0.002 zh_winterthur claude-3-5-sonnet-20241022
0.649 0 10000 -0.131 0.338 -0.081 0.123 -0.040 0.078 0.001 0.002 zh_winterthur claude-3-7-sonnet-20250219
0.759 0 10000 -0.031 0.586 0.008 0.126 0.051 0.079 0.001 0.002 zh_winterthur grok-3-beta
0.677 0 10000 -0.188 0.322 -0.137 0.130 -0.094 0.082 0.001 0.002 zh_winterthur gemini-2.5-flash
-0.112 0 10000 -0.557 -0.119 -0.460 0.074 -0.453 0.054 0.001 0.001 zh_winterthur gpt-4o-mini

All models seem to be consistent across roles. For every model/survey pair, none of the 10,000 permutations produced a DRI higher than the observed DRI (p = 0), suggesting that the observed value is unlikely to be due to chance.

References